Add two implementations split10 and split11 by rep-movsd · Pull Request #6 · tobbez/string-splitting

rep-movsd · 2016-05-29T10:51:19Z

split10

This is a simple implementation that uses std::find().

It uses a for loop with two variables:
itStart - iterator to the start of a word
itDelim - iterator to the next delimiter

On each pass we store the result of find() for the next delimiter in itDelim,
then take the half-open range [itStart, itDelim) as a token and emplace the
string into the result vector.

This very simple five-liner shows that good C++ code can beat C code without
any trouble.
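
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the function name split_find, its signature, and the default delimiter are assumptions made for the example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not the PR's actual split10): splits `line` on `delim`.
// Consecutive delimiters yield empty tokens; an empty line yields one empty token.
std::vector<std::string> split_find(const std::string& line, char delim = ' ')
{
    std::vector<std::string> tokens;
    for (auto itStart = line.begin(), itDelim = line.begin(); ; itStart = itDelim + 1) {
        // Find the next delimiter, or end() if none is left.
        itDelim = std::find(itStart, line.end(), delim);
        // Build the token in place from the half-open range [itStart, itDelim).
        tokens.emplace_back(itStart, itDelim);
        if (itDelim == line.end())
            break;  // last token stored, stop before stepping past end()
    }
    return tokens;
}
```

Under these assumptions, split_find("a b  c") gives the four tokens "a", "b", "" and "c". Every token is copied into its own std::string, which is the cost that split11 avoids.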

split11

Almost the same as split10, but we use a StringRef class just like the
one in split6, and a char pointer instead of iterators.
In the run below this raises the crunch speed by about 60% over split10
(5387448.0 to 8703679.3).

We get pretty close to the subparser version: only about 15% slower on my
system.
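
As a rough sketch of the idea, the version below stores non-owning (pointer, length) views instead of copying each token. The StringRef shown here is a hypothetical stand-in; the class used in split6 and split11 may look different, and split_ref is a name invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view of a token (assumption: the real StringRef may differ).
struct StringRef {
    const char* begin;
    std::size_t size;
    // Copy out only when an owning string is really needed.
    std::string str() const { return std::string(begin, size); }
};

// Hypothetical helper: the returned views point into `line`,
// so they are valid only while `line` is alive and unmodified.
std::vector<StringRef> split_ref(const std::string& line, char delim = ' ')
{
    std::vector<StringRef> tokens;
    const char* pStart = line.data();
    const char* const pEnd = pStart + line.size();
    for (;;) {
        // std::find works on raw char pointers as well as on iterators.
        const char* pDelim = std::find(pStart, pEnd, delim);
        tokens.push_back(StringRef{pStart, static_cast<std::size_t>(pDelim - pStart)});
        if (pDelim == pEnd)
            break;
        pStart = pDelim + 1;
    }
    return tokens;
}
```

No characters are copied and no per-token allocation happens, which is the likely source of the speedup over split10. The trade-off is lifetime: the input string must outlive the tokens.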

Benchmark results (trimmed output for 80 columns)

$ ./run_all.bash
=== System info
Arch rolling
Linux 4.5.5-1-ck x86_64 GNU/Linux
Intel(R) Core(TM) CPU X 920 @ 2.00GHz
g++ (GCC) 6.1.1 20160501
Python 3.5.1
=== End System info

./split.py Python: 38.9 seconds. Crunch Speed: 514293.2
./split5.py Python: 41.0 seconds. Crunch Speed: 488168.6
./split1 C++ : 8.7 seconds. Crunch speed: 2288344.6
./split2 C++ : 20.9 seconds. Crunch speed: 958164.2
./split6 C++ : 3.6 seconds. Crunch speed: 5603155.4
./split7 C++ : 2.6 seconds. Crunch speed: 7750547.7
./split8 C++ : 31.0 seconds. Crunch speed: 644411.0
./split9 C++ : 21.1 seconds. Crunch speed: 949104.7
./split10 C++ : 3.7 seconds. Crunch speed: 5387448.0
./split11 C++ : 2.3 seconds. Crunch speed: 8703679.3
./split_subparser C++ : 2.0 seconds. Crunch speed: 9956735.4
./splitc1 C++ : 7.9 seconds. Crunch speed: 2519434.9
./splitc2 C++ : 8.0 seconds. Crunch speed: 2484935.2
./splitc3 C++ : 8.0 seconds. Crunch speed: 2515293.7
$


rep-movsd added 3 commits May 29, 2016 16:19

split1 - Make minor tweaks

59b1885

Add split12

2ddaba0

Same as split11, except that it uses memchr(). This is now as fast as or faster than split_subparser. Building with instruction-set flags such as -msse2 or -march=native might make it even faster.
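
A minimal sketch of the memchr() variant, assuming tokens are kept as (pointer, length) pairs. The Token alias and the split_memchr name are made up for this example and are not taken from the split12 source.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type: (pointer into the input, length).
using Token = std::pair<const char*, std::size_t>;

// Hypothetical helper: like a pointer-based split, but the delimiter search
// is delegated to std::memchr, which C libraries usually optimize heavily.
std::vector<Token> split_memchr(const std::string& line, char delim = ' ')
{
    std::vector<Token> tokens;
    const char* pStart = line.data();
    const char* const pEnd = pStart + line.size();
    for (;;) {
        // memchr returns a null pointer when the delimiter is not found.
        const void* hit = std::memchr(pStart, delim, static_cast<std::size_t>(pEnd - pStart));
        const char* pDelim = hit ? static_cast<const char*>(hit) : pEnd;
        tokens.emplace_back(pStart, static_cast<std::size_t>(pDelim - pStart));
        if (pDelim == pEnd)
            break;
        pStart = pDelim + 1;
    }
    return tokens;
}
```

As with the StringRef approach, the tokens only point into the input string, so the input must stay alive while they are in use.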

rep-movsd wants to merge 3 commits into tobbez:master from rep-movsd:master
